NSF PAR Search | NSF Public Access Repository

Easy and accurate protein structure prediction using ColabFold

https://doi.org/10.21203/rs.3.pex-2490/v1

Kim, Gyuri; Lee, Sewon; Karin, Eli Levy; Kim, Hyunbin; Moriwaki, Yoshitaka; Ovchinnikov, Sergey; Steinegger, Martin; Mirdita, Milot (December 2023, Research Square)

Abstract Since its public release in 2021, AlphaFold2 (AF2) has made investigating biological questions, using predicted protein structures of single monomers or full complexes, a common practice. ColabFold-AF2 is an open-source Jupyter Notebook inside Google Colaboratory and a command-line tool, which makes it easy to use AF2, while exposing its advanced options. ColabFold-AF2 shortens turn-around times of experiments due to its optimized usage of AF2’s models. In this protocol, we guide the reader through ColabFold best-practices using three scenarios: (1) monomer prediction, (2) complex prediction, and (3) conformation sampling. The first two scenarios cover classic static structure prediction and are demonstrated on the human glycosylphosphatidylinositol transamidase (GPIT) protein. The third scenario demonstrates an alternative use-case of the AF2 models by predicting two conformations of the human Alanine Serine Transporter 2 (ASCT2). Users can run the protocol without command-line knowledge via Google Colaboratory or in a command-line environment. The protocol is available at https://protocol.colabfold.com.
Full Text Available
ColabFold: making protein folding accessible to all

https://doi.org/10.1038/s41592-022-01488-1

Mirdita, Milot; Schütze, Konstantin; Moriwaki, Yoshitaka; Heo, Lim; Ovchinnikov, Sergey; Steinegger, Martin (June 2022, Nature Methods)

Abstract ColabFold offers accelerated prediction of protein structures and complexes by combining the fast homology search of MMseqs2 with AlphaFold2 or RoseTTAFold. ColabFold’s 40–60-fold faster search and optimized model utilization enable prediction of close to 1,000 structures per day on a server with one graphics processing unit. Coupled with Google Colaboratory, ColabFold becomes a free and accessible platform for protein folding. ColabFold is open-source software available at https://github.com/sokrypton/ColabFold and its novel environmental databases are available at https://colabfold.mmseqs.com.
Full Text Available
DescribePROT in 2023: more, higher-quality and experimental annotations and improved data download options

https://doi.org/10.1093/nar/gkad985

Basu, Sushmita; Zhao, Bi; Biró, Bálint; Faraggi, Eshel; Gsponer, Jörg; Hu, Gang; Kloczkowski, Andrzej; Malhis, Nawar; Mirdita, Milot; Söding, Johannes; et al (November 2023, Nucleic Acids Research)

Abstract The DescribePROT database of amino acid-level descriptors of protein structures and functions has been substantially expanded since its release in 2020. This expansion includes a substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales including individual proteins, entire proteomes, and the whole database. The annotations in DescribePROT are useful for a broad spectrum of studies, including investigations of protein structure and function, development and validation of predictive tools, and efforts to understand the molecular underpinnings of diseases and to develop therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank

https://doi.org/10.1186/s13059-020-02023-1

Steinegger, Martin; Salzberg, Steven L. (December 2020, Genome Biology)

Full Text Available
Evolutionary balance between foldability and functionality of a glucose transporter

https://doi.org/10.1038/s41589-022-01002-w

Choi, Hyun-Kyu; Kang, Hyunook; Lee, Chanwoo; Kim, Hyun Gyu; Phillips, Ben P.; Park, Soohyung; Tumescheit, Charlotte; Kim, Sang Ah; Lee, Hansol; Roh, Soung-Hun; et al (July 2022, Nature Chemical Biology)

Full Text Available
DescribePROT: database of amino acid-level protein structure and function predictions

https://doi.org/10.1093/nar/gkaa931

Zhao, Bi; Katuwawala, Akila; Oldfield, Christopher J; Dunker, A Keith; Faraggi, Eshel; Gsponer, Jörg; Kloczkowski, Andrzej; Malhis, Nawar; Mirdita, Milot; Obradovic, Zoran; et al (October 2020, Nucleic Acids Research)
Abstract We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
Full Text Available
HFSP: high speed homology-driven function annotation of proteins

https://doi.org/10.1093/bioinformatics/bty262

Mahlich, Yannick; Steinegger, Martin; Rost, Burkhard; Bromberg, Yana (June 2018, Bioinformatics)

Abstract Motivation: The rapid drop in sequencing costs has produced many more (predicted) protein sequences than can feasibly be functionally annotated with wet-lab experiments. Thus, many computational methods have been developed for this purpose. Most of these methods employ homology-based inference, approximated via sequence alignments, to transfer functional annotations between proteins. The increase in the number of available sequences, however, has drastically increased the search space, thus significantly slowing down alignment methods. Results: Here we describe homology-derived functional similarity of proteins (HFSP), a novel computational method that uses results of a high-speed alignment algorithm, MMseqs2, to infer functional similarity of proteins on the basis of their alignment length and sequence identity. We show that our method is accurate (85% precision) and fast (more than 40-fold speed increase over state-of-the-art). HFSP can help correct at least a 16% error in legacy curations, even for a resource of as high quality as Swiss-Prot. These findings suggest HFSP as an ideal resource for large-scale functional annotation efforts. Supplementary information: Supplementary data are available at Bioinformatics online.
